JSP parser throws errors for certain scenarios #127

Open
CodesAway opened this issue Sep 19, 2021 · 2 comments

@CodesAway
Owner

Found this at work while using BEX to parse our JSP files. Below is the fix I found. I'll also add some unit tests.

	public static ImmutableIntRangeMap<ParsingState> parseJSPTextStates(final CharSequence text) {
		// TODO: used Java as a basis; needs to be enhanced
		// For example, to handle JSP Expression
		// https://www.tutorialspoint.com/jsp/jsp_syntax.htm

		// TODO: need to make a RangeMap class that correctly handles nested ranges
		// Currently, this doesn't work as expected for
		// "stuff <%= expression%> more stuff"
		// "more stuff" after the expression should be seen as part of the String literal,
		// but isn't, since the lookup gets the last range, which is the expression, which has already ended
		// Think this can be fixed by ending the outer state when entering an inner state
		// Then, when leaving the inner state, start a new state based on the outer state

		// TODO: make RangeMap class to handle this
		// When adding a new record, check for overlap using the below logic
		// + An overlap occurs if and only if
		// a) The added range's start is part of an existing range
		// * Can check by finding the existing range in the map and seeing if the added range's start is in the middle
		// * BEXUtilities.getEntryInRanges
		// b) An existing range's start is contained in the new range
		// * Can do a subRange check on the existing NavigableMap and see if there are any entries
		// If there's an overlap, handle by breaking the ranges apart into pieces

		// Parse text to get states
		// * Block comment
		// * Line comment
		// * In String literal
		// * Other stuff?

		// Reference: https://www.tutorialspoint.com/jsp/jsp_syntax.htm

		ImmutableIntRangeMap.Builder<ParsingState> builder = ImmutableIntRangeMap.builder();
		ArrayDeque<ParsingState> stateStack = new ArrayDeque<>();
		ArrayDeque<Integer> startTextInfoStack = new ArrayDeque<>();
		ArrayDeque<Integer> parentStartStack = new ArrayDeque<>();

		boolean isJava = false;
		// HTML tag
		boolean isTag = false;
		// TODO: should I refactor and use this? how would I use it?
		//		String expectedEnd = "";

		for (int i = 0; i < text.length(); i++) {
			//			if (i == 50) {
			//				System.out.println("Debug");
			//			}

			char c = text.charAt(i);

			//			System.out.printf("Index %s%n"
			//					+ "Char %s%n"
			//					+ "States %s%n"
			//					+ "Start %s%n"
			//					+ "Parent %s%n", i, c, stateStack, startTextInfoStack, parentStartStack);

			ParsingState currentState = unwrapParsingState(stateStack.peek());

			//			if (currentState == null) {
			//				System.out.println("Parent: " + i);
			//			}

			if (currentState == IN_STRING_LITERAL) {
				if (c == '\\') {
					// Escape next character
					if (nextChar(text, i) == '\0') {
						break;
					}

					i++;
				} else if (c == '"') {
					popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
				} else if (isTag && hasText(text, i, "<%=")) {
					pushNextLevelParsingState(IN_EXPRESSION_BLOCK, i, builder, stateStack, startTextInfoStack,
							parentStartStack);
					i += 2;

					isJava = true;
				}

				// Other characters don't matter??
				// TODO: handle unicode and other escaping in String literal
			} else if (currentState == IN_SECONDARY_STRING_LITERAL) {
				if (c == '\\') {
					// Escape next character
					if (nextChar(text, i) == '\0') {
						break;
					}

					i++;
				} else if (c == '\'') {
					popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
				} else if (hasText(text, i, "<%=")) {
					pushNextLevelParsingState(IN_EXPRESSION_BLOCK, i, builder, stateStack, startTextInfoStack,
							parentStartStack);
					i += 2;
				}

				// Other characters don't matter??
				// TODO: handle unicode and other escaping in String literal

				// TODO: Java comments only valid in <% code block %>
			} else if (isJava && hasText(text, i, "%>")) {
				isJava = false;

				//				System.out.println("Current: " + currentState
				//						+ "\t"
				//						+ i);

				if (currentState != IN_EXPRESSION_BLOCK) {
					// End the current state on the prior character
					popParsingState(i - 1, builder, stateStack, startTextInfoStack, parentStartStack);
				}

				i++;
				popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
			} else if (isJava && currentState == IN_LINE_COMMENT) {
				if (c == '\n' || c == '\r') {
					popParsingState(i - 1, builder, stateStack, startTextInfoStack, parentStartStack);
					i = handleLineTerminator(i, c, text, builder, stateStack, startTextInfoStack, parentStartStack);
					//					int startTextInfo = startTextInfoStack.pop();
					//					builder.put(IntBEXRange.of(startTextInfo, i), stateStack.pop());
				}
				// Other characters don't matter?
			} else if (isJava && currentState == IN_MULTILINE_COMMENT) {
				if (hasText(text, i, "*/")) {
					i++;
					popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
					//					int startTextInfo = startTextInfoStack.pop();
					//					builder.put(IntBEXRange.closed(startTextInfo, i), stateStack.pop());
				}
			} else if (currentState == IN_MULTILINE_COMMENT) {
				if (hasText(text, i, "--%>")) {
					i += 3;
					popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
				}
			} else if (currentState == IN_SECONDARY_MULTILINE_COMMENT) {
				if (hasText(text, i, "-->")) {
					i += 2;
					popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
				}
			} else if (isJava && c == '/' && nextChar(text, i) == '/') {
				pushNextLevelParsingState(IN_LINE_COMMENT, i, builder, stateStack, startTextInfoStack,
						parentStartStack);
				i++;
			} else if (isJava && c == '/' && nextChar(text, i) == '*') {
				pushNextLevelParsingState(IN_MULTILINE_COMMENT, i, builder, stateStack, startTextInfoStack,
						parentStartStack);
				i++;
			} else if (c == '"' && isTag) {
				pushNextLevelParsingState(IN_STRING_LITERAL, i, builder, stateStack, startTextInfoStack,
						parentStartStack);
			} else if (c == '\'' && isTag) {
				pushNextLevelParsingState(IN_SECONDARY_STRING_LITERAL, i, builder, stateStack,
						startTextInfoStack, parentStartStack);
			} else if (c == '"' && isJava) {
				pushParsingState(IN_STRING_LITERAL, i, stateStack, startTextInfoStack, parentStartStack);
			} else if (c == '\'' && isJava) {
				pushParsingState(IN_SECONDARY_STRING_LITERAL, i, stateStack, startTextInfoStack, parentStartStack);
			} else if (hasText(text, i, "<%--")) {
				pushParsingState(IN_MULTILINE_COMMENT, i, stateStack, startTextInfoStack, parentStartStack);
				i += 3;
			} else if (hasText(text, i, "<!--")) {
				pushParsingState(IN_SECONDARY_MULTILINE_COMMENT, i, stateStack, startTextInfoStack, parentStartStack);
				i += 3;
			} else if (hasText(text, i, "<%=")) {
				// In Java expression
				pushParsingState(IN_EXPRESSION_BLOCK, i, stateStack, startTextInfoStack, parentStartStack);
				i += 2;
				isJava = true;
			} else if (hasText(text, i, "<%!")) {
				// In Java declaration (tracked with the same state as an expression block)
				pushParsingState(IN_EXPRESSION_BLOCK, i, stateStack, startTextInfoStack, parentStartStack);
				i += 2;
				isJava = true;
			} else if (hasText(text, i, "<%")) {
				// In Java scriptlet
				pushParsingState(IN_EXPRESSION_BLOCK, i, stateStack, startTextInfoStack, parentStartStack);
				i++;
				isJava = true;
			} else if (c == '<' && !isJava && !isTag) {
				pushParsingState(IN_TAG, i, stateStack, startTextInfoStack, parentStartStack);
				isTag = true;
			} else if (c == '>' && isTag && !isJava) {
				isTag = false;
				popParsingState(i, builder, stateStack, startTextInfoStack, parentStartStack);
			} else if (Character.isWhitespace(c)) {
				i = handleWhitespace(i, c, text, builder, stateStack, startTextInfoStack, parentStartStack);
			}
		}

		if (!stateStack.isEmpty()) {
			// TODO: what if there are multiple entries?
			// (this would suggest improperly formatted code)
			int startTextInfo = startTextInfoStack.pop();
			// TODO: does there need to be a parent?
			if (startTextInfo != text.length()) {
				builder.put(IntBEXRange.of(startTextInfo, text.length()), stateStack.pop());
			}
		}

		return builder.build();
	}
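
The overlap rule described in the RangeMap TODO comments above can be sketched roughly as follows. This is only an illustration, not the BEX implementation: the class name `RangeOverlapSketch`, the method `overlaps`, and the choice to store each range as "start mapped to inclusive end" in a plain `TreeMap` are all assumptions made for the example. It uses the standard `NavigableMap.floorEntry` and `NavigableMap.subMap` methods in place of `BEXUtilities.getEntryInRanges`.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the overlap check from the TODO comments.
// Existing ranges are stored as: start index -> inclusive end index.
class RangeOverlapSketch {
	static boolean overlaps(NavigableMap<Integer, Integer> ranges, int start, int end) {
		// (a) The added range's start is part of an existing range:
		// find the closest existing range starting at or before 'start'
		// and check whether it extends to (or past) 'start'.
		Map.Entry<Integer, Integer> floor = ranges.floorEntry(start);
		if (floor != null && floor.getValue() >= start) {
			return true;
		}

		// (b) An existing range's start is contained in the new range:
		// any key inside [start, end] means an existing range begins there.
		return !ranges.subMap(start, true, end, true).isEmpty();
	}

	public static void main(String[] args) {
		NavigableMap<Integer, Integer> ranges = new TreeMap<>();
		ranges.put(0, 4);
		ranges.put(10, 20);

		System.out.println(overlaps(ranges, 5, 9));   // false: fits in the gap
		System.out.println(overlaps(ranges, 15, 25)); // true: case (a)
		System.out.println(overlaps(ranges, 5, 12));  // true: case (b)
	}
}
```

When an overlap is detected, the comments suggest splitting the ranges into pieces; that step is left out here because its exact behavior depends on how the RangeMap class ends up being designed.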
CodesAway added this to the 0.14 milestone on Sep 19, 2021
CodesAway self-assigned this on Sep 19, 2021
@CodesAway
Owner Author

No unit tests were added, since I don't remember the scenario I changed the code for... oops.

@CodesAway
Owner Author

Never mind, I already added unit tests and forgot, but noticed them when reviewing the changes for the pull request merge. Yay!
